Building Dual AI Models and Nomograms Using Noninvasive Parameters for Aiding Male Bladder Outlet Obstruction Diagnosis and Minimizing the Need for Invasive Video-Urodynamic Studies: Development and Validation Study

Background: Diagnosing the underlying causes of nonneurogenic male lower urinary tract symptoms associated with bladder outlet obstruction (BOO) is challenging. Video-urodynamic studies (VUDS) and pressure-flow studies (PFS) are both invasive methods for diagnosing BOO. VUDS can differentiate the etiologies of male BOO more precisely, such as benign prostatic obstruction, primary bladder neck obstruction, and dysfunctional voiding, and may therefore outperform PFS.

Objective: The invasive nature of these examinations highlights the need for noninvasive predictive models that facilitate BOO diagnosis and reduce the need for invasive procedures.

Methods: We conducted a retrospective study of a cohort of men with medication-refractory, nonneurogenic lower urinary tract symptoms suspected of having BOO who underwent VUDS from 2001 to 2022. In total, 2 BOO predictive models were developed: 1 based on the International Continence Society's definition (International Continence Society–defined bladder outlet obstruction; ICS-BOO) and the other on video-urodynamic studies–diagnosed bladder outlet obstruction (VBOO). The patient cohort was randomly split into training and test sets. A total of 6 machine learning algorithms, including logistic regression, were used for model development. We first performed development validation using repeated 5-fold cross-validation on the training set. We then performed test validation to assess each model's performance on the independent test set. Both models were implemented as paper-based nomograms and integrated into a web-based artificial intelligence prediction tool to aid clinical decision-making.

Results: Among 307 patients, 26.7% (n=82) met the ICS-BOO criteria, while 82.1% (n=252) were diagnosed with VBOO. The ICS-BOO prediction model had a mean area under the receiver operating characteristic curve (AUC) of 0.74 (SD 0.09) and a mean accuracy of 0.76 (SD 0.04) in development validation. In test validation, its AUC and accuracy were 0.86 and 0.77, respectively. The VBOO prediction model yielded a mean AUC of 0.71 (SD 0.06) and a mean accuracy of 0.77 (SD 0.06) in development validation. In test validation, its AUC and accuracy were 0.72 and 0.76, respectively. Applying both models to the same patient combines their predictions, which can improve clinical decision-making and simplify the diagnostic pathway. Under this dual-model approach, if both models predict BOO, this suggests that all such cases resulted from medication-refractory primary bladder neck obstruction or benign prostatic obstruction, and surgical intervention may be considered. VUDS might thus be unnecessary for 100 (32.6%) patients. Conversely, when the ICS-BOO prediction is negative but the VBOO prediction is positive, the etiology is likely varied. In that case, VUDS rather than PFS is advised to establish a precise diagnosis and guide subsequent therapy. This rule accurately identified 51.1% (47/92) of patients for VUDS.

Conclusions: The 2 machine learning models predicting ICS-BOO and VBOO, based on 6 noninvasive clinical parameters, demonstrate commendable discrimination performance. With the dual-model prediction approach, VUDS may be avoided when both models predict positively. This assists in male BOO diagnosis and reduces the need for such invasive procedures.
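The dual-model triage described in the Results can be expressed as a small decision function. The following is a minimal sketch under several assumptions. Each model is assumed to output a probability and to have its own binary cutoff; `t_ics`, `t_vboo`, their 0.5 defaults, and all names here are hypothetical placeholders, not the study's thresholds. The text does not address two combinations: ICS-BOO positive with VBOO negative, and both negative. For those, the function returns a placeholder rather than a recommendation.

```python
from enum import Enum


class Triage(Enum):
    """Hypothetical labels for the dual-model pathway described in the abstract."""
    CONSIDER_SURGERY = "both models positive: VUDS may be avoided; surgery may be considered"
    VUDS_ADVISED = "ICS-BOO negative, VBOO positive: VUDS rather than PFS is advised"
    NOT_ADDRESSED = "combination not addressed by the dual-model rule"


def dual_model_triage(p_ics: float, p_vboo: float,
                      t_ics: float = 0.5, t_vboo: float = 0.5) -> Triage:
    """Combine the ICS-BOO and VBOO predicted probabilities for one patient.

    The default cutoffs of 0.5 are placeholders, not the study's thresholds.
    """
    ics_positive = p_ics >= t_ics
    vboo_positive = p_vboo >= t_vboo
    if ics_positive and vboo_positive:
        return Triage.CONSIDER_SURGERY
    if vboo_positive:  # reached only when the ICS-BOO prediction is negative
        return Triage.VUDS_ADVISED
    return Triage.NOT_ADDRESSED


print(dual_model_triage(0.8, 0.9).name)  # CONSIDER_SURGERY
print(dual_model_triage(0.2, 0.9).name)  # VUDS_ADVISED
```

Keeping the cutoffs as parameters lets each model use its own operating point, for example one chosen by the Youden index, without changing the triage logic.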


I. Table S1. Performance of Models 1 and 2 Using Six Machine Learning Algorithms
The metrics assessed are the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The optimal threshold was determined by the Youden index, which selects the cutoff that maximizes sensitivity + specificity - 1. The algorithms compared are logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosting (GB), and extreme gradient boosting (XGB).
• AUC is the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative one, reflecting the model's overall discriminative performance.
• Sensitivity is the true positive rate, reflecting the model's ability to detect patients with the condition.
• Specificity is the true negative rate, reflecting the model's ability to correctly identify patients without the condition.
• PPV is the proportion of positive predictions that are true positives, reflecting precision.
• NPV is the proportion of negative predictions that are true negatives, reflecting how reliably a negative prediction rules out the condition.

This suggests that the dual-model approach is fairly consistent with the actual diagnoses made using the ICS-BOO and VBOO criteria. Overall, the predictive models demonstrated substantial agreement with the actual diagnoses, which could streamline the diagnostic process for BOO.
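The metric definitions above, together with the Youden-index threshold, can be made concrete in a few lines of plain Python. This is a minimal sketch, not the study's code.

- The AUC is computed directly from its rank definition, the probability that a random positive outscores a random negative. Ties count one half, which is the usual Mann-Whitney convention and an assumption here.
- Candidate thresholds are limited to the observed scores.
- The function names and the toy data are hypothetical.

```python
def auc_rank(scores, labels):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_at(scores, labels, threshold):
    """Counts (TP, FP, TN, FN) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(scores, labels):
    """Observed score maximizing J = sensitivity + specificity - 1 (lowest wins ties)."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at(scores, labels, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def threshold_metrics(scores, labels, threshold):
    """Sensitivity, specificity, PPV, NPV, and accuracy at a fixed threshold."""
    tp, fp, tn, fn = confusion_at(scores, labels, threshold)

    def ratio(a, b):
        return a / b if b else float("nan")  # undefined when the denominator is 0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": (tp + tn) / len(labels),
    }


# Toy example (hypothetical scores, not study data).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
cutoff = youden_threshold(scores, labels)
print(cutoff)                    # 0.35
print(auc_rank(scores, labels))  # 0.75
print(threshold_metrics(scores, labels, cutoff))
```

Sensitivity and specificity do not depend on prevalence, whereas PPV and NPV do. This is why PPV and NPV on a test set reflect the BOO prevalence in that set.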

V. Document S1: Medication-Refractory Male LUTS
Medication-refractory male LUTS is defined as a lack of symptomatic response to medical treatment over a continuous three-month period, a duration that aligns with clinical practice.
Given that alpha-blockers and antimuscarinics typically act rapidly, with symptom relief often occurring within a week if they are effective, a three-month period ensures that patients have attended multiple outpatient clinic evaluations and have had adequate trials of various monotherapies or combination therapies.
All medications referenced are consistent with those outlined in Section 5.2 of the EAU guideline on pharmacological treatment. The specific medications and their ATC codes are listed in the accompanying table. α1-blockers are usually considered the first-line drug treatment for male LUTS because of their rapid onset of action, good efficacy, and low rate and severity of adverse events [1].
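The three-month criterion can be applied to prescription records with a short helper. This is a minimal sketch under several assumptions.

- Each treatment course is assumed to be a (start, end) pair of dates.
- Overlapping or back-to-back courses, whether monotherapy or combination, are assumed to count as continuous treatment.
- "Three months" is approximated as 90 days.
- Response is assumed to be a single yes/no flag.

`MIN_TRIAL_DAYS` and both function names are hypothetical simplifications, not the study's operational definition.

```python
from datetime import date, timedelta

# Hypothetical operationalization: "three months" approximated as 90 days.
MIN_TRIAL_DAYS = 90


def longest_continuous_treatment(courses):
    """Longest run of uninterrupted treatment in days (inclusive).

    Overlapping or back-to-back (start, end) courses are merged, so sequential
    monotherapies and combination therapies form one continuous run.
    """
    if not courses:
        return 0
    spans = sorted(courses)
    longest = 0
    run_start, run_end = spans[0]
    for start, end in spans[1:]:
        if start <= run_end + timedelta(days=1):  # overlaps or starts the next day
            run_end = max(run_end, end)
        else:  # a gap in treatment ends the current run
            longest = max(longest, (run_end - run_start).days + 1)
            run_start, run_end = start, end
    return max(longest, (run_end - run_start).days + 1)


def is_medication_refractory(courses, symptomatic_response):
    """True if symptoms did not respond despite >= MIN_TRIAL_DAYS of continuous treatment."""
    return (not symptomatic_response
            and longest_continuous_treatment(courses) >= MIN_TRIAL_DAYS)


# Example: one course followed the next day by a second (e.g. combination) course.
courses = [(date(2020, 1, 1), date(2020, 2, 15)),
           (date(2020, 2, 16), date(2020, 4, 10))]
print(longest_continuous_treatment(courses))                          # 101
print(is_medication_refractory(courses, symptomatic_response=False))  # True
```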

Figure S1: Calibration plot of ICS-BOO prediction (Model 1). Calibration plots visually compare predicted probabilities with observed outcomes to assess whether the model's predictions align with actual risk. ICS-BOO: International Continence Society-defined BOO.

Document S2: Single Multiclass Prediction Model for VUDS Diagnosis
An initial attempt was made to develop a single AI model using noninvasive parameters to predict the exact VUDS diagnosis (a 5-class classification), but it resulted in poor accuracy. Multiclass classification assigns each sample to one of several categories. In this study, the predictive performance for 5 categories of bladder outlet obstruction was evaluated: benign prostatic obstruction (BPO), primary bladder neck obstruction (PBNO), PRES, dysfunctional voiding (DV), and non-BOO. Because the PRES and DV groups had smaller sample sizes, the dataset exhibited class imbalance. To address this, we used the LogisticRegression class from sklearn to adjust class weights for the imbalanced dataset. Different parameter settings were tested, including setting the class weight to 'balanced' or None and choosing between 'ovr' (one-vs-rest) and …
After adjusting various parameter combinations, the following settings were selected: (1) 'class_weight' set to 'balanced', which weights each class inversely to its frequency, and (2) the classification method set to 'ovr', a one-vs-rest strategy. Performance metrics were calculated using micro-averaging.
In the multiclass classification metrics for BOO, most specificities and negative predictive values (NPVs) reached as high as 0.8. This seemingly indicates that the model could distinguish and predict patients without BOO. However, most sensitivities were below 0.5, indicating that the model failed to detect the majority of patients with BOO.
Additionally, the positive predictive values (PPVs) did not exceed 0.6. Fewer than 60% of patients predicted to have BOO actually had it, a relatively poor result.
Another crucial metric, the F1 score, reached a maximum of only 0.53, and the overall micro-averaged accuracy of the model was 0.34. Clearly, the overall performance of the multiclass BOO prediction model was suboptimal, as detailed in Table S2-2.
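Two computations behind the multiclass settings can be sketched in plain Python: `class_weight='balanced'` and micro-averaged one-vs-rest metrics.

- scikit-learn documents `'balanced'` as weighting each class by `n_samples / (n_classes * n_samples_in_class)`.
- Micro-averaging pools the one-vs-rest confusion counts of all classes before computing each metric.

The sketch assumes single-label predictions. The function names and toy labels are hypothetical, not study data. Recent scikit-learn releases deprecate the `multi_class` argument of `LogisticRegression`; wrapping the estimator in `sklearn.multiclass.OneVsRestClassifier` gives one-vs-rest behavior explicitly.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Per-class weights n_samples / (n_classes * count), as class_weight='balanced' defines them."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def one_vs_rest_counts(y_true, y_pred, classes):
    """(TP, FP, FN, TN) per class, treating that class as positive and all others as negative."""
    counts = {}
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        counts[cls] = (tp, fp, fn, len(y_true) - tp - fp - fn)
    return counts


def micro_average(counts):
    """Pool the per-class counts across classes, then compute each metric once."""
    tp, fp, fn, tn = (sum(c[i] for c in counts.values()) for i in range(4))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy example with hypothetical labels (not study data).
classes = ["BPO", "PBNO", "PRES", "DV", "non-BOO"]
y_true = ["BPO", "BPO", "PBNO", "DV", "non-BOO"]
y_pred = ["BPO", "PBNO", "PBNO", "BPO", "non-BOO"]
print(micro_average(one_vs_rest_counts(y_true, y_pred, classes)))
# {'sensitivity': 0.6, 'specificity': 0.9, 'ppv': 0.6, 'npv': 0.9}
```

Each misclassified patient adds one false positive (for the predicted class) and one false negative (for the true class). Pooled FP therefore equals pooled FN, so for single-label predictions micro-averaged sensitivity and PPV both equal overall accuracy. The pooled true-negative count, by contrast, grows with the number of classes. This is one reason one-vs-rest specificity and NPV can look high while sensitivity stays low.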

Table S1. Predictive Performance of Models 1 and 2 on the Test Dataset Using Six Machine Learning Algorithms